Similar resources
Statistical significance of ungapped sequence alignments.
The statistical significance of a local sequence alignment depends not only on the similarity score and on the sequence lengths, but also on the length of the alignment. The dependence of alignment significance on sequence length has been analyzed earlier, and is based on the idea that longer sequences have more chances to share a local similarity with a higher score. To the best of o...
Estimating statistical significance of sequence alignments.
Algorithms that compare two protein or DNA sequences and produce an alignment of the best matching segments are widely used in molecular biology. These algorithms produce scores that, when comparing random sequences of length n, grow proportionally to n or to log(n), depending on the algorithm parameters. The Azuma-Hoeffding inequality gives an upper bound on the probability of large deviations of ...
Approximate Statistics of Gapped Alignments
A heuristic approximation to the score distribution of gapped alignments in the logarithmic domain is presented. The method applies to comparisons between random, unrelated protein sequences, using standard score matrices and arbitrary gap penalties. It is shown that gapped alignment behavior is essentially governed by a single parameter, alpha, depending on the penalty scheme and sequence comp...
Robust E-Values for Gapped Local Alignments
We examine a Poisson heuristic for judging the significance of local sequence alignments with gaps. Model parameters are estimated directly from the sequences to be aligned, so that laborious prior simulation studies or database comparisons for the estimation of parameters describing the connection between score and E-value are unnecessary. Simulation studies give evidence that this method give...
Accurate formula for P-values of gapped local sequence and profile alignments.
A simple general approximation for the distribution of gapped local alignment scores is presented, suitable for assessing the significance of comparisons between two protein sequences or between a sequence and a profile. The approximation takes account of the scoring scheme (i.e. gap penalty and substitution matrix or profile), sequence composition and length. Use of this formula means it is unnecessary to...
Journal
Journal title: Journal of Computational Biology
Year: 2008
ISSN: 1066-5277,1557-8666
DOI: 10.1089/cmb.2008.0125